Many of us have probably encountered matrices at some point in math. They're these tables with seemingly byzantine rules for combining them. Take the first element of the first row, the first element of the first column, multiply them together, then add, then spin three times fast... And that's to say nothing of the rules for inverting them.
How did someone come up with this?
We all probably memorized these rules at some point, but what's really going on underneath it all? How did someone actually come up with these rules, and with the very idea of matrices? In this post, we'll answer these questions by deriving matrices ourselves and seeing the intuition underlying them.
Representing Functions
Before we jump into matrices, we need to start by understanding functions and how we represent them. Functions can be thought of as rules that take in an input and return an output.
One simple way to represent a function is a table that lists each input alongside the function's output. For instance, for a function f we could write:
| x | f(x) |
|---|------|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
This is a perfectly fine representation, although very verbose (the full table would need a row for every possible input). Can we create a simpler representation?
Yes: f(x) = 2x. This representation, a polynomial, is not only concise but powerful.
How? Let's say I have another function g(x) = 10x, and someone asks, "What happens if I apply f, followed by g?"
Before we had this representation, we would have had to make a full table where we write out f(x) and then apply g on it for each input, like below:
| x | f(x) | g(f(x)) |
|---|------|---------|
| 1 | 2 | 20 |
| 2 | 4 | 40 |
| 3 | 6 | 60 |
Yikes. This is tedious.
But now that we have the polynomial representation, we can just do:

g(x) = 10x

g(f(x)) = 10 \cdot f(x)

g(f(x)) = 10 \cdot (2x) = 20x
Using simple steps, we can figure out g(f(x)) and describe the composed function for any arbitrary input. The same is true of g(x) + f(x), and so on.
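To make this concrete, here's a minimal Python sketch (my own illustration, not from any library) of the same idea: once functions have a compact representation, combining them is mechanical:

```python
# Represent f(x) = 2x and g(x) = 10x by their single coefficients.
f_coeff = 2
g_coeff = 10

def compose(outer, inner):
    """Compose two linear polynomials outer*x and inner*x into (outer*inner)*x."""
    return outer * inner

gf_coeff = compose(g_coeff, f_coeff)  # 20, i.e. g(f(x)) = 20x
print([gf_coeff * x for x in (1, 2, 3)])  # [20, 40, 60] -- matches the table
```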
In short, the right representation allows us to quickly understand, combine, and process functions.
Linear Maps
Let's now focus on a specific type of function that forms the foundation of matrices: Linear Maps. Linear Maps are functions that have a few special properties. Here's a simple example of a linear map:
f(x) = 3x for real numbers x.
The special properties of these functions are:
f(c \cdot x) = c \cdot f(x)
You can multiply by a scalar before or after applying the function and get the same result.
f(x + y) = f(x) + f(y)
You can add two inputs before or after applying the function and get the same result.
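As a quick sanity check (a sketch of my own, not part of the original post), we can verify both properties numerically for f(x) = 3x:

```python
def f(x):
    return 3 * x

# Scaling: multiply before or after applying f -- same result.
assert f(4 * 2.5) == 4 * f(2.5)

# Addition: add the inputs first or the outputs after -- same result.
assert f(1.5 + 2.0) == f(1.5) + f(2.0)

print("both linearity properties hold on these samples")
```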
Early mathematicians realized that such functions can model many phenomena in the real world, from physics to statistics. In fact, geometrically, all linear maps can be thought of as combinations of rotations and scalings:
Rotations
Functions that rotate shapes around a point.
Scalings
Functions that take a 2D image and scale its width and/or height.
Representing Linear Maps
Now that we've understood linear maps, how can we represent them succinctly, in a way that makes them easy to combine and use?
1-Dimension
Let's start with linear maps that take in real numbers and return real numbers (all in 1 dimension - the real line).
(Figure: a 1-D linear map, showing points on an input number line mapping to an output number line.)
These linear maps can always be written as f(x) = mx for some m. Pretty simple! (Note that f(x) = mx + b with b \neq 0 fails the scaling property above, so it isn't a linear map.)
Notice how points on the first number line map to points on the output number line.
2-Dimensions
But what about in 2 dimensions, where the input is a vector such as \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and the output is also a vector in 2D?
Let's say f is a scaling function that stretches its input's width by 3x and its height by 5x. You can see it visually below:
The linear map ff scales the purple rectangle into the black rectangle.
Let's just start by describing what the function does to two very simple vectors, \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}. Since f stretches widths by 3x and heights by 5x, we get f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) = \begin{bmatrix} 3 \\ 0 \end{bmatrix} and f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}) = \begin{bmatrix} 0 \\ 5 \end{bmatrix}.
Based on this, can we figure out what f(\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9} \end{bmatrix}) is? Amazingly, yes!
The diagram below shows our approach visually:
We will decompose \begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9} \end{bmatrix} into a combination of \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}} (the two vectors for which we know the value of f).
More precisely, here's how we find the final value. First decompose the input:

\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9} \end{bmatrix} = \textcolor{blue}{10} \cdot \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} + \textcolor{#228B22}{9} \cdot \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}

Then, by the two linearity properties,

f(\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9} \end{bmatrix}) = \textcolor{blue}{10} \cdot f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) + \textcolor{#228B22}{9} \cdot f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}) = \textcolor{blue}{10} \cdot \begin{bmatrix} 3 \\ 0 \end{bmatrix} + \textcolor{#228B22}{9} \cdot \begin{bmatrix} 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 30 \\ 45 \end{bmatrix}
So we're able to find f(\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9} \end{bmatrix}) by just knowing f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) and f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}).
In fact, this is true more generally: we can represent the function entirely by what it does to the vectors \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}, and use that to find all potential outputs!
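Here's a small Python sketch (my own illustration) of this fact: storing only what f does to the two basis vectors is enough to evaluate it anywhere:

```python
# What f does to the two basis vectors (f scales widths by 3x, heights by 5x).
f_e1 = (3, 0)  # f([1, 0])
f_e2 = (0, 5)  # f([0, 1])

def apply_f(x, y):
    """Evaluate f on [x, y] by decomposing it as x*[1,0] + y*[0,1]."""
    return (x * f_e1[0] + y * f_e2[0],
            x * f_e1[1] + y * f_e2[1])

print(apply_f(10, 9))  # (30, 45)
```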
Matrices
So is there a way I can quickly say what a linear map does to \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}?
Yes! A simple 2x2 matrix!
f (as we defined it in the previous section) can be represented by the notation

F = \begin{bmatrix} \textcolor{blue}{3} & \textcolor{#228B22}{0} \\ \textcolor{blue}{0} & \textcolor{#228B22}{5} \end{bmatrix}

where the first column is f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) and the second column is f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}).
This is extremely cool - we can describe the entire function, and how it operates on an infinite number of points, with a little four-value table.
So just like whenever you see f(x) = 3x, you immediately know you're dealing with a function, when you see a matrix, know that you are dealing with a linear map.
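To see the recipe in code, here's a minimal NumPy sketch (my own, with hypothetical names) that builds a matrix representation by recording what a function does to the two basis vectors:

```python
import numpy as np

def f(v):
    """The scaling map from earlier: stretch width by 3x, height by 5x."""
    return np.array([3 * v[0], 5 * v[1]])

# The matrix's columns are f([1, 0]) and f([0, 1]).
F = np.column_stack([f(np.array([1, 0])), f(np.array([0, 1]))])
print(F)
# [[3 0]
#  [0 5]]
```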
Example of Creating a Matrix Representation
Let's quickly see an example of how we find the matrix representation for a linear map. Let's imagine we have a linear map g that rotates vectors 90º counter-clockwise. What would the matrix G for this function g be?
As earlier, let's start with just understanding g on our two simple vectors, \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}. Rotating \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} by 90º counter-clockwise gives \begin{bmatrix} 0 \\ 1 \end{bmatrix}, and rotating \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}} gives \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
Putting it all together, we see that G, the matrix representation of g, is G = \begin{bmatrix} \textcolor{blue}{0} & \textcolor{#228B22}{-1} \\ \textcolor{blue}{1} & \textcolor{#228B22}{0} \end{bmatrix}.
Applying the Function
Ok so we know we can represent a linear map as a matrix - but how do we apply this function on an input?
Specifically, say I have the same function g. I want to apply this function on the vector \begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}.
How do we do it?
Let’s start with what we know:
The first column of the matrix tells us g(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}).
The second column tells us g(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}).
To make use of that, let's break \begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix} into a combination of \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} and \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}:

g(\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}) = \textcolor{blue}{x} \cdot g(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) + \textcolor{#228B22}{y} \cdot g(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}) = \textcolor{blue}{x} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} + \textcolor{#228B22}{y} \cdot \begin{bmatrix} -1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\textcolor{#228B22}{y} \\ \textcolor{blue}{x} \end{bmatrix}
Does this look familiar? Because it is exactly what you would get by carrying out the matrix multiplication \begin{bmatrix} \textcolor{blue}{0} & \textcolor{#228B22}{-1} \\ \textcolor{blue}{1} & \textcolor{#228B22}{0} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}!
The matrix multiplication rule is just shorthand for applying the function on a vector! If we ever want to find the result of rotating some vector v 90º counter-clockwise, we just have to compute Gv.
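Here's a quick NumPy check (my own sketch) that the two views agree:

```python
import numpy as np

G = np.array([[0, -1],
              [1,  0]])  # matrix for the 90º counter-clockwise rotation

def g(v):
    """Rotate [x, y] by 90º counter-clockwise directly."""
    x, y = v
    return np.array([-y, x])

v = np.array([2.0, 7.0])
print(G @ v)  # [-7.  2.]
print(g(v))   # [-7.  2.] -- multiplying by G applies the function
```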
Matrix Multiplication as Function Composition
Ok so we've seen that multiplying a 2x2 matrix by a vector is just applying the function on the vector. But what does matrix multiplication mean when we are multiplying two 2x2 matrices?
To answer this, we're going to take a small detour. We know how to apply a function f on an input vector v. What if I ask you to find the composition h(v) = g(f(v)), where we know f and g?
Let’s work through this now:
Let G be the matrix for the function g.
Let F be the matrix for the function f.
We want to find the matrix H that represents the function h(v) = g(f(v)).
As before, the first column of H must be h(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) = g(f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}})), and f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) is just \textcolor{blue}{F_{col1}}, the first column of F (similarly, the second column of H is g(\textcolor{#228B22}{F_{col2}})).
So what's g(\textcolor{blue}{F_{col1}})?
Well, we just saw earlier that this is just G \cdot \textcolor{blue}{F_{col1}}. So we can now write:
H = \begin{bmatrix} G \cdot \textcolor{blue}{F_{col1}} & G \cdot \textcolor{#228B22}{F_{col2}} \end{bmatrix}
Seem familiar? Because it's exactly the formula for 2x2 matrix multiplication, H = G \cdot F!
So H, the matrix representing g(f(x)), is just G \cdot F.
Thus, matrix multiplication is just a way to compose linear maps.
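Here's a short NumPy check (my own sketch) that composing the functions and multiplying the matrices agree:

```python
import numpy as np

F = np.array([[3, 0],
              [0, 5]])   # scale: width 3x, height 5x
G = np.array([[0, -1],
              [1,  0]])  # rotate 90º counter-clockwise

H = G @ F  # matrix for the composition g(f(v))

v = np.array([1.0, 2.0])
print(G @ (F @ v))  # apply f, then g: [-10.  3.]
print(H @ v)        # apply the composed map in one step: [-10.  3.]
```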
An Example of Function Composition
Let's work through a full example to see this linear map composition in action.
Let f be our function as earlier (represented by the matrix F = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}).
Let g be the function that rotates a vector 90º counter-clockwise as before (represented by G = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}).
Finding g Composed With f
Let's work through h = g \circ f applied on the vector v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. We can use just the properties of f and g to figure out this result (see diagram below).
We start with v.
We know f stretches its input by 3x in the x direction and by 5x in the y direction, leading to f(v) = \begin{bmatrix} 6 \\ 5 \end{bmatrix}.
We then just rotate the resulting vector 90º counter-clockwise to get g(f(v)) = \begin{bmatrix} -5 \\ 6 \end{bmatrix}.
g Composed With f Using Matrices
Do we get the same result when using function composition through matrices? Let's find out:
The matrix for g \circ f is H = G \cdot F = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix}.
Applying it directly: H \cdot v = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -5 \\ 6 \end{bmatrix}.
This is exactly what we got when visually applying g(f(v)). So we were able to get the same result without having to apply each function one at a time - we just used matrix multiplication to find the composed function and applied it directly. While this already saves us time, imagine how much time this would save if we had to compose 10 or 20 different functions!
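The same check in NumPy (my own sketch), including the many-functions case mentioned above:

```python
import numpy as np
from functools import reduce

F = np.array([[3, 0], [0, 5]])
G = np.array([[0, -1], [1, 0]])
v = np.array([2, 1])

print(G @ (F @ v))  # apply f, then g, step by step: [-5  6]
print((G @ F) @ v)  # apply the composed matrix H = G*F once: [-5  6]

# Composing many linear maps collapses into one matrix product
# (the rightmost map is applied to the vector first):
pipeline = [G, F, G, F, G]
H_all = reduce(np.matmul, pipeline)
print(H_all @ v)
```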
(Interactive: plug in matrices to explore the transformations they create when applied to the unit square.)
Big Picture: Matrices Give Us Power
Remember how much power we gained by being able to represent a function as a polynomial? We were able to combine polynomials, compose them, multiply them - we now have all that same power for linear maps!
We can compose them (matrix multiplication), add them (matrix addition), and invert them (matrix inversion) all by following some simple rules.
A more sophisticated way of saying this is that we have an algebra on these functions (just like we have an algebra on integers).
This is what linear algebra broadly means: the study of how we can combine and compose linear maps, just as we combine and compose numbers.
Looking Forward
So we’ve seen pretty clearly that matrices are just functions and that linear algebra is the study of combining these functions.
Understanding this helps us perceive some of the key ideas of linear algebra in a new way. In the next post, we're going to focus on eigenvectors and see how they provide valuable insights into these linear maps.